cycle length
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Data Science (0.70)
Direct Multi-Token Decoding
Luo, Xuan, Wang, Weizhi, Yan, Xifeng
Decoder-only transformers have become the standard architecture for large language models (LLMs) due to their strong performance. Recent studies suggest that, in pre-trained LLMs, early, middle, and late layers may serve distinct roles: Early layers focus on understanding the input context, middle layers handle task-specific processing, and late layers convert abstract representations into output tokens. We hypothesize that once representations have been processed by the early and middle layers, the resulting hidden states may encapsulate sufficient information to support the generation of multiple tokens using only the late layers, eliminating the need to repeatedly traverse the early and middle layers. We refer to this inference paradigm as Direct Multi-Token Decoding (DMTD). Unlike speculative decoding, our method introduces no additional parameters, auxiliary routines, or post-generation verification. Despite being trained on a limited dataset, a fine-tuned DMTD Qwen3-4B model has already demonstrated promising results, achieving up to a 2x speedup with only minor performance loss. Moreover, as shown in our scaling analysis, its performance is expected to further improve with larger training datasets.
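The amortization argument in the DMTD abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is an idealized model, not the authors' measurement: it assumes every layer costs the same, ignores embeddings, the output head, and memory traffic, and uses hypothetical layer counts (`shared_layers`, `late_layers`) and a hypothetical `cycle_len` (tokens decoded per pass through the early and middle layers).

```python
def ideal_dmtd_speedup(shared_layers, late_layers, cycle_len):
    """Idealized speedup from layer counts alone (all values are assumptions).

    shared_layers: early + middle layers, traversed once per cycle
    late_layers:   late layers, traversed for every generated token
    cycle_len:     number of tokens decoded per traversal of the shared layers
    """
    if cycle_len < 1:
        raise ValueError("cycle_len must be at least 1")
    # Standard decoding traverses every layer for every token.
    baseline = shared_layers + late_layers
    # With reuse, the shared layers are amortized over cycle_len tokens.
    per_token = late_layers + shared_layers / cycle_len
    return baseline / per_token


# Hypothetical split: 24 shared layers, 12 late layers, 3 tokens per cycle.
print(ideal_dmtd_speedup(shared_layers=24, late_layers=12, cycle_len=3))  # 1.8
```

Under this simplified model the speedup is bounded above by `(shared_layers + late_layers) / late_layers`, however long the cycle is.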
CycleNet: Enhancing Time Series Forecasting through Modeling Periodic Patterns
The stable periodic patterns present in time series data serve as the foundation for conducting long-horizon forecasts. In this paper, we pioneer the exploration of explicitly modeling this periodicity to enhance the performance of models in long-term time series forecasting (LTSF) tasks. Specifically, we introduce the Residual Cycle Forecasting (RCF) technique, which utilizes learnable recurrent cycles to model the inherent periodic patterns within sequences, and then performs predictions on the residual components of the modeled cycles.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Data Science > Data Mining (0.72)
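The Residual Cycle Forecasting idea from the CycleNet abstract above (model the cycle, then predict only the residual) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's model: the learnable recurrent cycle is replaced by a per-phase mean, the residual predictor is a trivial stand-in that carries forward the mean of the last cycle's residuals, and the names `fit_cycle` and `rcf_forecast` are hypothetical.

```python
from statistics import fmean


def fit_cycle(series, cycle_len):
    """Estimate one value per cycle position (per-phase mean).

    Stand-in for the learnable recurrent cycle described in the abstract.
    """
    return [fmean(series[phase::cycle_len]) for phase in range(cycle_len)]


def rcf_forecast(series, cycle_len, horizon):
    """Forecast `horizon` steps as cycle component plus predicted residual."""
    if len(series) < cycle_len:
        raise ValueError("need at least one full cycle of history")
    cycle = fit_cycle(series, cycle_len)
    # Remove the periodic component from the history.
    residual = [x - cycle[t % cycle_len] for t, x in enumerate(series)]
    # Stand-in residual model: mean residual over the most recent cycle.
    residual_pred = fmean(residual[-cycle_len:])
    start = len(series)
    # Add the cycle back, keeping its phase aligned with the time index.
    return [cycle[(start + h) % cycle_len] + residual_pred for h in range(horizon)]


history = [1.0, 2.0, 3.0, 4.0] * 5
print(rcf_forecast(history, cycle_len=4, horizon=6))
# [1.0, 2.0, 3.0, 4.0, 1.0, 2.0]
```

The cycle length is supplied by the caller here. Any residual predictor could take the place of the stand-in mean.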
Deep Reinforcement Learning-based Cell DTX/DRX Configuration for Network Energy Saving
Mao, Wei, Wei, Lili, Semiari, Omid, Yeh, Shu-ping, Nikopour, Hosein
3GPP Release 18 cell discontinuous transmission and reception (cell DTX/DRX) is an important new network energy saving feature for 5G. As a time-domain technique, it periodically aggregates the user data transmissions in a given duration of time when the traffic load is not heavy, so that the remaining time can be kept silent and advanced sleep modes (ASM) can be enabled to shut down more radio components and save more energy for the cell. However, the packet delay inevitably increases, as no transmission is allowed during the silent period. In this paper we study how to configure cell DTX/DRX to optimally balance energy saving and packet delay, so that for delay-sensitive traffic maximum energy saving can be achieved while the degradation of quality of service (QoS) is minimized. As the optimal configuration can be different for different network and traffic conditions, the problem is complex and we resort to a deep reinforcement learning (DRL) framework to train an AI agent to solve it. Through careful design of 1) the learning algorithm, which implements a deep Q-network (DQN) on a contextual bandit (CB) model, and 2) the reward function, which utilizes a smooth approximation of a theoretically optimal but discontinuous reward function, we are able to train an AI agent that always tries to select the best possible cell DTX/DRX configuration under any network and traffic conditions. Simulation results show that compared to the case when cell DTX/DRX is not used, our agent can achieve up to ~45% energy saving depending on the traffic load scenario, while always maintaining no more than ~1% QoS degradation.
- North America > United States (0.05)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- Telecommunications (0.94)
- Energy (0.68)
- Consumer Products & Services > Travel (0.54)
- Information Technology > Security & Privacy (0.34)
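The cell DTX/DRX abstract above mentions replacing a discontinuous reward with a smooth approximation but does not give either function. The sketch below only illustrates the general technique with a hypothetical example: a hard reward that pays the energy saving only when the delay stays within a budget, and a logistic gate as its smooth counterpart. The reward form, the `delay_budget_ms` threshold, and the `sharpness` parameter are all assumptions, not the authors' design.

```python
import math


def step_reward(energy_saving, delay_ms, delay_budget_ms):
    """Hypothetical discontinuous reward: all or nothing at the delay budget."""
    return energy_saving if delay_ms <= delay_budget_ms else 0.0


def smooth_reward(energy_saving, delay_ms, delay_budget_ms, sharpness=1.0):
    """Smooth counterpart: the hard threshold becomes a logistic gate.

    gate = 1 / (1 + exp(z)), written with tanh so large |z| cannot overflow.
    """
    z = sharpness * (delay_ms - delay_budget_ms)
    gate = 0.5 * (1.0 - math.tanh(0.5 * z))
    return energy_saving * gate


# Exactly at the budget the gate is 0.5, so half the saving is credited.
print(smooth_reward(0.4, delay_ms=10.0, delay_budget_ms=10.0))  # 0.2
```

A larger `sharpness` brings the smooth reward closer to the step function, at the cost of steeper gradients near the threshold.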
Traffic Signal Phase and Timing Estimation with Large-Scale Floating Car Data
Liao, Mingcheng, Feng, Zebang, Fan, Miao, Xu, Shengtong, Xiong, Haoyi
Effective modern transportation systems depend critically on accurate Signal Phase and Timing (SPaT) estimation. However, acquiring ground-truth SPaT information faces significant hurdles due to communication challenges with transportation departments and signal installers. As a result, Floating Car Data (FCD) has become the primary source for large-scale SPaT analyses. Current FCD approaches often simplify the problem by assuming fixed schedules and basic intersection designs for specific times and locations. These methods fail to account for periodic signal changes, diverse intersection structures, and the inherent limitations of real-world data, thus lacking a comprehensive framework that is universally applicable. Addressing this limitation, we propose an industrial-grade FCD analysis suite that manages the entire process, from initial data preprocessing to final SPaT estimation. Our approach estimates signal phases, identifies time-of-day (TOD) periods, and determines the durations of red and green lights. The framework's notable stability and robustness across diverse conditions, regardless of road geometry, is a key feature. Furthermore, we provide a cleaned, de-identified FCD dataset and supporting parameters to facilitate future research. Currently operational within our navigation platform, the system analyses over 15 million FCD records daily, supporting over two million traffic signals in mainland China, with more than 75% of estimations demonstrating less than five seconds of error.
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
LightGTS: A Lightweight General Time Series Forecasting Model
Wang, Yihang, Qiu, Yuying, Chen, Peng, Shu, Yang, Rao, Zhongwen, Pan, Lujia, Yang, Bin, Guo, Chenjuan
Existing works on general time series forecasting build foundation models with heavy model parameters through large-scale multi-source pre-training. These models achieve superior generalization ability across various datasets at the cost of significant computational burdens and limitations in resource-constrained scenarios. This paper introduces LightGTS, a lightweight general time series forecasting model designed from the perspective of consistent periodical modeling. To handle diverse scales and intrinsic periods in multi-source pre-training, we introduce Periodical Tokenization, which extracts consistent periodic patterns across different datasets with varying scales. To better utilize the periodicity in the decoding process, we further introduce Periodical Parallel Decoding, which leverages historical tokens to improve forecasting. Based on the two techniques above which fully leverage the inductive bias of periods inherent in time series, LightGTS uses a lightweight model to achieve outstanding performance on general time series forecasting. It achieves state-of-the-art forecasting performance on 9 real-world benchmarks in both zero-shot and full-shot settings with much better efficiency compared with existing time series foundation models.
- Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- (7 more...)
- Energy > Renewable (0.46)
- Energy > Power Industry (0.46)
TGDT: A Temporal Graph-based Digital Twin for Urban Traffic Corridors
Yousefzadeh, Nooshin, Sengupta, Rahul, Dilmore, Jeremy, Ranka, Sanjay
TGDT takes input parameters (highlighted in orange), including ingress aggregated traffic waveforms, signal timing parameters (e.g., cycle length, offset, and maximum green duration for each phase), driving behavior parameters (e.g., speed, acceleration, space cushion, lane-changing behavior), turning movement ratios, and the distances between intersections along the major corridor. It simultaneously generates multiple outputs (highlighted in blue), such as westbound travel times along the corridor, queue lengths for each lane group phase, and average waiting times for each lane group phase. The time intervals of the output time series match those of the input inflow waveforms.
Figure 2: Overview of TGDT framework. This diagram illustrates the architecture of our proposed Digital Twin for urban corridors, which consists of three main modules. Simulation records, extracted from the logs of a microscopic traffic simulator, are transformed into graph-structured data that uniquely represent the corridor's traffic state for each scenario. The inflow module (M_inf) performs a graph imputation task to reconstruct 2D traffic volumes on every intermediate road segment. The travel time module (M_tt) carries out a graph regression task to estimate bidirectional corridor-level travel time series. Finally, the queue length (M_ql) and waiting time (M_wt) modules apply temporal convolution and deconvolution operations on the spatiotemporal representations learned by M_tt, producing 3D outputs for maximum queue length and waiting time. These estimates are generated at the intersection and phase levels for each lane group associated with a specific movement phase.
[...] is then passed through additional layers that combine convolutional and transposed convolutional operations for multi-channel temporal feature learning from 1D multivariate inputs.
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- North America > United States > Texas (0.04)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (0.89)
Dynamics of Structured Complex-Valued Hopfield Neural Networks
Garimella, Rama Murthy, Valle, Marcos Eduardo, Vieira, Guilherme, Rayala, Anil, Munugoti, Dileep
In this paper, we explore the dynamics of structured complex-valued Hopfield neural networks (CvHNNs), which arise when the synaptic weight matrix possesses specific structural properties. We begin by analyzing CvHNNs with a Hermitian synaptic weight matrix and establish the existence of four-cycle dynamics in CvHNNs with skew-Hermitian weight matrices operating synchronously. Furthermore, we introduce two new classes of complex-valued matrices: braided Hermitian and braided skew-Hermitian matrices. We demonstrate that CvHNNs utilizing these matrix types exhibit cycles of length eight when operating in full parallel update mode. Finally, we conduct extensive computational experiments on synchronous CvHNNs, exploring other synaptic weight matrix structures.
This work was supported in part by the National Council for Scientific and Technological Development (CNPq) under grant no. 315820/2021-7, the São Paulo Research Foundation (FAPESP) under grant no. 2023/03368-0, and the Postdoctoral Researcher Program (PPPD) at the Universidade Estadual de Campinas (UNICAMP).
Keywords: Hopfield neural network, complex-valued neural network, associative memory, braided Hermitian matrix.
1 Introduction
Artificial neural networks have been conceived as emulators of the biological neural network synapse process. Their processing units, the artificial neurons, usually act based on input signals received from other neurons or cells. Like a biological neuron firing an electric impulse in the presence of specific chemical components in appropriate concentrations, an artificial neuron fires when certain mathematical conditions are satisfied.
Jointly Assigning Processes to Machines and Generating Plans for Autonomous Mobile Robots in a Smart Factory
Leet, Christopher, Sciortino, Aidan, Koenig, Sven
A modern smart factory runs a manufacturing procedure using a collection of programmable machines. Typically, materials are ferried between these machines using a team of mobile robots. To embed a manufacturing procedure in a smart factory, a factory operator must a) assign its processes to the smart factory's machines and b) determine how agents should carry materials between machines. Existing smart factory management systems solve the aforementioned problems sequentially, limiting the throughput that they can achieve. In this paper we introduce ACES, the Anytime Cyclic Embedding Solver, the first solver which jointly optimizes the assignment of processes to machines and the assignment of paths to agents. We evaluate ACES and show that it can scale to real industrial scenarios.
I. INTRODUCTION
Modern smart factories are designed to enable flexible manufacturing [1]. A flexible manufacturing system is a system which can produce a variety of different products with minimal reconfiguration [2]. Flexibility can improve a manufacturer's ability to customize products, reduce the time that it takes to fulfill new orders, and lower the costs of producing a new product. To permit flexible manufacturing, a smart factory needs the following two components: 1) Flexible Machines. Flexible machines are general-purpose machines such as CNC machines which can be programmed to carry out a range of manufacturing processes [4].
- North America > United States > California (0.14)
- Europe (0.04)
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
Schaipp, Fabian, Hägele, Alexander, Taylor, Adrien, Simsekli, Umut, Bach, Francis
We show that learning-rate schedules for large model training behave surprisingly similarly to a performance bound from non-smooth convex optimization theory. We provide a bound for the constant schedule with linear cooldown; in particular, the practical benefit of cooldown is reflected in the bound due to the absence of logarithmic terms. Further, we show that this surprisingly close match between optimization theory and practice can be exploited for learning-rate tuning: we achieve noticeable improvements for training 124M and 210M Llama-type models by (i) extending the schedule for continued training with the optimal learning rate, and (ii) transferring the optimal learning rate across schedules.
- North America > United States > New York (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- (2 more...)
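The constant schedule with linear cooldown discussed in the learning-rate abstract above can be written down directly. The sketch below is illustrative only: the function name, the `cooldown_frac` parameter and its default of 0.2, and the absence of a warmup phase are assumptions, not settings taken from the paper.

```python
def constant_with_linear_cooldown(step, total_steps, base_lr, cooldown_frac=0.2):
    """Learning rate at `step` (0-indexed) for a constant-then-linear-cooldown schedule.

    The rate stays at base_lr, then decays linearly over the final
    cooldown_frac of training, reaching zero at step == total_steps.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")
    cooldown_steps = round(total_steps * cooldown_frac)
    cooldown_start = total_steps - cooldown_steps
    if cooldown_steps == 0 or step < cooldown_start:
        return base_lr  # constant phase
    # Linear decay: fraction of the cooldown window still remaining.
    return base_lr * (total_steps - step) / cooldown_steps


# 100 steps, last 20 used for cooldown: halfway through the cooldown the rate is halved.
print(constant_with_linear_cooldown(90, total_steps=100, base_lr=1.0))  # 0.5
```

Extending training with this schedule amounts to calling it with a larger `total_steps`, which moves the start of the cooldown later while leaving the constant phase unchanged.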